Dataset Integration

Integration method: Linked Inference of Genomic Experimental Relationships (LIGER) (Welch et al. 2019)

LIGER Preprocessing

  1. Data were normalized for different numbers of UMIs per cell.

  2. Variable genes were selected within each dataset.

  3. Data were scaled by root-mean-square across cells.

  4. Cells with no expression in any gene, and genes with no expression in any cell, were removed.

Key parameters:

  • Number of variable genes per dataset (individual) selected for integration: 3000

  • Total number of variable genes used for integration (the union across all individuals): 4850
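The four preprocessing steps can be illustrated with a minimal pure-Python sketch. It is not the LIGER implementation: the function names are hypothetical, cells are assumed to be rows and genes columns, and ranking genes by variance-to-mean ratio is a simple stand-in for LIGER's own variable-gene criterion.

```python
import math

def normalize(counts):
    """Step 1: divide each cell's counts by its total UMI count (cells are rows)."""
    return [[c / (sum(cell) or 1) for c in cell] for cell in counts]

def select_variable_genes(norm, n_top):
    """Step 2: keep the n_top genes with the highest variance-to-mean ratio.
    ASSUMPTION: this ranking only stands in for LIGER's selection rule."""
    n_cells = len(norm)
    scores = []
    for g in range(len(norm[0])):
        col = [cell[g] for cell in norm]
        mean = sum(col) / n_cells
        var = sum((v - mean) ** 2 for v in col) / n_cells
        scores.append(var / mean if mean > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:n_top])

def scale_by_rms(norm, genes):
    """Step 3: divide each selected gene by its root-mean-square across cells
    (no centering, so values stay non-negative)."""
    n_cells = len(norm)
    rms = {g: math.sqrt(sum(cell[g] ** 2 for cell in norm) / n_cells) for g in genes}
    return [[cell[g] / rms[g] if rms[g] > 0 else 0.0 for g in genes] for cell in norm]

def drop_empty(scaled, genes):
    """Step 4: remove cells and genes that have no expression at all."""
    keep_cells = [i for i, cell in enumerate(scaled) if any(v > 0 for v in cell)]
    keep_cols = [j for j in range(len(genes))
                 if any(scaled[i][j] > 0 for i in keep_cells)]
    matrix = [[scaled[i][j] for j in keep_cols] for i in keep_cells]
    return matrix, keep_cells, [genes[j] for j in keep_cols]

# Toy data: 4 cells x 4 genes; genes 1 and 3 and cell 3 are never expressed.
counts = [[10, 0, 5, 0], [0, 0, 20, 0], [3, 0, 3, 0], [0, 0, 0, 0]]
norm = normalize(counts)
genes = select_variable_genes(norm, n_top=3)
scaled = scale_by_rms(norm, genes)
matrix, kept_cells, kept_genes = drop_empty(scaled, genes)
```

In this report the equivalent of `n_top` is 3000 genes per dataset; the toy value of 3 is only for illustration.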

Note: The size of the union depends on how much the variable genes selected in each dataset (individual) overlap. Check the Venn and UpSet plots below to confirm that no dataset is an outlier.

Venn diagram of selected variable genes: The large numbers give the count of variable genes shared between datasets; the numbers in parentheses identify the datasets.

UpSet chart of selected variable genes: The first 6 vertical bars show how many of the variable genes used for integration were selected in only one dataset.
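The quantities behind these two plots can be reproduced with a short sketch. It assumes the per-dataset variable genes are available as Python sets; the function name, dataset names and gene names are hypothetical.

```python
from collections import Counter

def union_and_memberships(variable_genes):
    """variable_genes maps a dataset name to its set of selected genes.

    Returns the union used for integration and, for every combination of
    datasets, the number of genes selected in exactly that combination
    (the bar heights of an UpSet chart)."""
    union = set().union(*variable_genes.values())
    membership = Counter(
        frozenset(name for name, genes in variable_genes.items() if gene in genes)
        for gene in union
    )
    return union, membership

# Toy example with three datasets (hypothetical gene names).
selected = {
    "a": {"G1", "G2", "G3"},
    "b": {"G2", "G3", "G4"},
    "c": {"G3", "G5"},
}
union, membership = union_and_memberships(selected)

# Genes selected in one dataset only; an unusually large count for a single
# dataset is one sign that it may be an outlier.
unique_counts = {name: membership[frozenset({name})] for name in selected}
```

With 3000 genes selected per dataset and a union of 4850, most genes are shared by at least two datasets; a dataset whose single-dataset bar dominates the chart deserves a closer look.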

LIGER Factorization

  1. An integrative non-negative matrix factorization was performed in order to identify shared and distinct metagenes (factors) across the datasets.

  2. The corresponding factor/metagene loadings were computed for each cell.

Key parameters:

  • Number of Factors (inner dimension of factorization; k): 20

  • Penalty parameter which limits the dataset-specific component of the factorization (lambda): 5

  • Resolution parameter which controls the number of communities detected: 1
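For reference, the objective minimized by integrative non-negative matrix factorization, as formulated in Welch et al. (2019), shows where k and lambda enter. The symbols are not defined elsewhere in this report: E_i is the scaled cells-by-genes matrix of dataset i, H_i holds its cell factor loadings, W holds the shared metagenes and V_i the dataset-specific metagenes.

```latex
\min_{W,\,V_i,\,H_i \,\ge\, 0} \;
\sum_{i=1}^{d} \left\lVert E_i - H_i \left( W + V_i \right) \right\rVert_F^2
\;+\; \lambda \sum_{i=1}^{d} \left\lVert H_i V_i \right\rVert_F^2 ,
\qquad
H_i \in \mathbb{R}^{n_i \times k},\quad W, V_i \in \mathbb{R}^{k \times m}
```

Here d is the number of datasets, n_i the number of cells in dataset i, m the number of variable genes (4850 in this report) and k the number of factors (20). A larger lambda (5 here) penalizes the dataset-specific term more strongly, pushing more of the signal into the shared metagenes W.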

Dimension Reduction

Batch Effect Correction by LIGER

  1. Visualisation of the batch effect using tSNE plots.

  2. Quantification of the batch effect based on kBET (Büttner et al. 2019) test results. The rejection rate for each test is the fraction of neighbourhoods whose batch-label composition differs from the global composition of batch labels. An observed rejection rate that differs significantly from the expected rate indicates that the data are not well mixed.

• Covariate: diagnosis

• Covariate: manifest

• Covariate: sex
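The rejection rate described above can be sketched in a few lines. This is a simplified illustration rather than the kBET package: it tests every cell with a fixed neighbourhood size using Pearson's chi-squared test, and it does not compute the expected rejection rate that the report compares against. The function names and the parameters `k` and `alpha` are assumptions made for the example.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df
    (closed-form series, no external libraries)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total

def rejection_rate(points, batches, k=10, alpha=0.05):
    """Fraction of k-nearest-neighbour neighbourhoods whose batch composition
    differs significantly from the global batch composition."""
    n = len(points)
    global_freq = {b: c / n for b, c in Counter(batches).items()}
    df = len(global_freq) - 1
    rejected = 0
    for i in range(n):
        # Brute-force neighbour search; the cell itself counts as a neighbour.
        nearest = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))[:k]
        local = Counter(batches[j] for j in nearest)
        stat = sum((local.get(b, 0) - k * f) ** 2 / (k * f)
                   for b, f in global_freq.items())
        if chi2_sf(stat, df) < alpha:
            rejected += 1
    return rejected / n

# Two toy embeddings of 40 cells from two batches.
labels_mixed = ["A" if i % 2 == 0 else "B" for i in range(40)]
mixed = [(float(i), 0.0) for i in range(40)]            # batches interleaved
labels_split = ["A"] * 20 + ["B"] * 20
split = ([(i * 0.01, 0.0) for i in range(20)]
         + [(100 + i * 0.01, 0.0) for i in range(20)])  # batches far apart

rate_mixed = rejection_rate(mixed, labels_mixed)   # 0.0: every neighbourhood is balanced
rate_split = rejection_rate(split, labels_split)   # 1.0: every neighbourhood is one batch
```

Applied to the tSNE or factor-loading coordinates with a covariate such as diagnosis, manifest or sex as the batch label, a rate near 0 suggests good mixing and a rate near 1 suggests a strong residual batch effect.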

Clustering

Key parameters:


scFlow v0.4.2 – 2020-04-23 10:06:59

References

Büttner, Maren, Zhichao Miao, F. Alexander Wolf, Sarah A. Teichmann, and Fabian J. Theis. 2019. “A test metric for assessing single-cell RNA-seq batch correction.” Nature Methods 16 (1): 43–49. https://doi.org/10.1038/s41592-018-0254-1.

Welch, Joshua D., Velina Kozareva, Ashley Ferreira, Charles Vanderburg, Carly Martin, and Evan Z. Macosko. 2019. “Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity.” Cell 177 (7). Cell Press: 1873–1887.e17. https://doi.org/10.1016/j.cell.2019.05.006.

 

A report by scFlow